MARC FORMAT RECORDS
===================
Prepared by Doug Lowry
June 13, 1986
OBJECTIVE:
=========
To examine the structure of MARC format records with a
view to writing a preprocessor which will create standard
format records from MARC records.
LIMITATIONS:
===========
What follows is not an exhaustive study. Harvey Martens
has done a preliminary analysis. I have carried it a few
steps further, aided by discussion with Michael xxxxxx of
the National Research Council (613 xxx-xxxx) on June 12.
BASIC RECORD STRUCTURE:
======================
MARC records occur in blocks. Each block is preceded by a
4 byte value; the first two bytes are the high and low order
bytes respectively of the length of the block. The next two
bytes are each null. For example, octal values
026 270 000 000
indicate a block length of 5816 bytes. (It is not
yet clear whether this count includes or excludes the
four bytes for the block length.)
An individual MARC record consists of these components:
1. A 4 byte record length indicator
2. A 24 byte leader
3. A record directory or entries map
4. Control fields and variable fields
5. A group separator character
RECORD LENGTH INDICATOR:
=======================
Each MARC record is preceded by 4 bytes: the high order
byte and the low order byte respectively of the record
length in bytes, then two null bytes. For example, octal
values
001 366 000 000
indicate a record length of 502 bytes (1 x 256 + 246). This
count includes the four byte indicator.
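The block prefix and the record length indicator share the same
four byte layout, so one small routine can decode both. The
following Python is only a sketch; the function name
decode_length_prefix is made up for illustration.

```python
def decode_length_prefix(prefix: bytes) -> int:
    """Decode a 4 byte prefix: high order byte, low order byte, two nulls."""
    if len(prefix) != 4:
        raise ValueError("prefix must be exactly 4 bytes")
    if prefix[2:] != b"\x00\x00":
        raise ValueError("expected two trailing null bytes")
    return (prefix[0] << 8) | prefix[1]


# The two octal examples quoted in these notes.
block_length = decode_length_prefix(bytes([0o026, 0o270, 0o000, 0o000]))
record_length = decode_length_prefix(bytes([0o001, 0o366, 0o000, 0o000]))
print(block_length, record_length)  # 5816 502
```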
RECORD LEADER:
=============
24 bytes as follows:
   1...5    ASCII record length in bytes, EXCLUDING the four byte
            record length indicator above.
   6        Record status letter (N= new, C= correction,
            D= deletion, ...)
   7        Type (codes not currently known)
   8        Bibliographic category (A= analytic, M= monograph,
            S= serial, ...)
   11       Indicator count (uncertain... not immediately relevant)
   13...17  Seems to be an ASCII count of the number of bytes in
            the following directory entries map.
   18...24  Uncertain
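A possible way to pick the known leader positions apart is sketched
below in Python. The names Leader and parse_leader are hypothetical,
the sample leader is invented for illustration, and the meaning given
to bytes 13...17 is only the tentative reading stated above.

```python
from typing import NamedTuple


class Leader(NamedTuple):
    record_length: int    # bytes 1...5, excludes the 4 byte indicator
    status: str           # byte 6
    record_type: str      # byte 7
    category: str         # byte 8
    directory_count: int  # bytes 13...17 (tentative interpretation)


def parse_leader(leader: bytes) -> Leader:
    """Extract the leader positions whose meaning is known so far."""
    if len(leader) != 24:
        raise ValueError("leader must be exactly 24 bytes")
    text = leader.decode("ascii")
    return Leader(
        record_length=int(text[0:5]),
        status=text[5],
        record_type=text[6],
        category=text[7],
        directory_count=int(text[12:17]),
    )


# Invented sample: length 498, status N, type A, category M.
sample = parse_leader(b"00498NAM  2200157       ")
print(sample.record_length, sample.status, sample.category)  # 498 N M
```

If the leader and the record length indicator agree, the leader
length plus four should equal the indicator value (498 + 4 = 502 in
this invented sample).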
RECORD DIRECTORY:
================
A record directory consists of a series of 12 byte entries,
each made up of three ASCII numeric values:
   3 byte field number
   4 byte inclusive length of field in bytes
   5 byte offset of the field in bytes from the beginning
     of the field data
The field numbers in the examples examined so far appear in
numeric order within the directory. A field number may occur
more than once. The location of the data appears in near
random order (possibly the order in which fields were added).
Note the offsets in the following real example:
   Field  Length  Offset
   008    0039    00000
   009    0032    00284
   022    0025    00134
   035    0030    00104
   088    0007    00229
   089    0036    00236
   090    0012    00272
   100    0014    00159
   245    0041    00063
   260    0008    00039
   260    0008    00047
   300    0008    00055
   410    0056    00173
   RS
A close examination of the directory shows that it is
arithmetically coherent. For example, at offset 00000
above, there is something 39 bytes long. Sure enough,
the next offset in ascending order is 00039. There are
8 bytes in that field, and the next offset is 00047, etc.
A directory is terminated by an "RS" or record separator
byte (octal 036).
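The directory layout and the coherence check described above can be
sketched in Python as follows. The function names parse_directory and
is_contiguous are made up for illustration; the rows are the real
example from the table above, re-encoded as 12 byte entries.

```python
RS = 0x1E  # record separator, octal 036


def parse_directory(data: bytes):
    """Return (entries, bytes_consumed).

    Each entry is (field_number, length, offset). Parsing stops at
    the RS byte, which is counted in bytes_consumed.
    """
    entries = []
    pos = 0
    while pos < len(data) and data[pos] != RS:
        chunk = data[pos:pos + 12]
        if len(chunk) < 12:
            raise ValueError("truncated directory entry")
        text = chunk.decode("ascii")
        entries.append((text[0:3], int(text[3:7]), int(text[7:12])))
        pos += 12
    if pos >= len(data):
        raise ValueError("directory not terminated by RS")
    return entries, pos + 1


def is_contiguous(entries) -> bool:
    """Check that, sorted by offset, each field starts where the last ended."""
    expected = 0
    for _, length, offset in sorted(entries, key=lambda e: e[2]):
        if offset != expected:
            return False
        expected = offset + length
    return True


rows = [
    ("008", 39, 0), ("009", 32, 284), ("022", 25, 134), ("035", 30, 104),
    ("088", 7, 229), ("089", 36, 236), ("090", 12, 272), ("100", 14, 159),
    ("245", 41, 63), ("260", 8, 39), ("260", 8, 47), ("300", 8, 55),
    ("410", 56, 173),
]
raw = "".join(f"{num}{length:04d}{offset:05d}"
              for num, length, offset in rows).encode("ascii") + bytes([RS])

entries, used = parse_directory(raw)
print(len(entries), used, is_contiguous(entries))  # 13 157 True
```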
FIELD CONTENTS:
==============
Fields are of two types...
control fields, numbered 001...009
variable fields, numbered 010...999
Control fields are fixed format and specialized. For
now, control fields other than 009 can be ignored.
Field 008 is usually present, but its contents are
duplicated in 009. We will treat 009 as if it were a
variable field, except that non-printing characters
should be replaced by white space.
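The treatment proposed for field 009 might look like this in Python.
This is a sketch only: the name blank_non_printing is hypothetical,
and "non-printing" is assumed here to mean any byte outside the
printable ASCII range 040...176 octal.

```python
def blank_non_printing(data: bytes) -> bytes:
    """Replace every byte outside printable ASCII with a space."""
    return bytes(b if 0x20 <= b <= 0x7E else 0x20 for b in data)


print(blank_non_printing(b"ab\x00c\x1e"))  # b'ab c '
```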
Variable fields are essentially free text. Sub fields
may exist within a field. For now, we can indicate new
sub fields by replacing the separators by a newline
symbol in the compressed text.
All fields (except the first) and all sub fields begin
with a "US" (unit separator, octal 037) followed by a
single lower case character. The significance of
different characters is not yet known. We do know that
it is safe to collapse out the "US" byte and the following
byte as white space for preprocessing, and to replace them
by a newline for creating compressed text.
All fields end with an "RS" 036 record separator.
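The two treatments of the "US" marker described above can be
sketched in Python as follows. The function names are made up for
illustration, and the sample field is invented. The trailing "RS" is
stripped before substitution, which is an assumption about what the
preprocessor would want.

```python
import re

# A US byte (octal 037) followed by any single byte.
_SUBFIELD_MARK = re.compile(rb"\x1f.", re.DOTALL)


def for_preprocessing(field: bytes) -> bytes:
    """Collapse each US byte and the byte after it into one space."""
    return _SUBFIELD_MARK.sub(b" ", field.rstrip(b"\x1e"))


def for_compressed_text(field: bytes) -> bytes:
    """Replace each US byte and the byte after it by a newline."""
    return _SUBFIELD_MARK.sub(b"\n", field.rstrip(b"\x1e"))


sample_field = b"\x1faTitle\x1fbSub\x1e"
print(for_preprocessing(sample_field))    # b' Title Sub'
print(for_compressed_text(sample_field))  # b'\nTitle\nSub'
```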
END OF RECORD:
=============
Records are terminated by a single "GS" byte (group
separator, octal 035).
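Putting the components together, one record could be cut into its
parts roughly as below. This Python sketch makes two assumptions not
confirmed in these notes: that the record length indicator also
counts the final "GS" byte, and that the directory begins immediately
after the 24 byte leader. The name split_record and the sample record
are invented.

```python
RS = 0x1E  # record separator, octal 036
GS = 0x1D  # group separator, octal 035


def split_record(record: bytes):
    """Return (leader, directory, field_data) for one complete record."""
    if len(record) < 4 + 24 + 2:
        raise ValueError("record too short")
    declared = (record[0] << 8) | record[1]
    if declared != len(record):  # assumes the count includes the GS byte
        raise ValueError("length indicator does not match record size")
    if record[-1] != GS:
        raise ValueError("record not terminated by GS")
    leader = record[4:28]
    dir_end = record.index(RS, 28)        # RS that closes the directory
    directory = record[28:dir_end]
    field_data = record[dir_end + 1:-1]   # everything up to the GS
    return leader, directory, field_data


# Invented record: one directory entry pointing at a 3 byte field.
body = (b"00041NAM  2200013       " + b"245000300000" + bytes([RS])
        + b"ab" + bytes([RS]) + bytes([GS]))
sample_record = bytes([0, len(body) + 4, 0, 0]) + body

leader, directory, field_data = split_record(sample_record)
print(directory, field_data)  # b'245000300000' b'ab\x1e'
```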
ADDITIONAL NOTES:
================
Consistent order within records would be reasonably assured
if we extract data in field number order per the directory
rather than by actual occurrence within the record. For
preprocessing, this would mean that at least one full record
would have to be held in RAM before preprocessing it.
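With a whole record in memory, extraction in field number order
reduces to slicing the field data by each directory entry. A minimal
Python sketch, assuming directory entries are already available as
(field_number, length, offset) tuples; the function name and the
sample data are invented.

```python
def fields_in_number_order(entries, field_data: bytes):
    """Return (field_number, raw_bytes) pairs sorted by field number.

    The sort is stable, so repeated field numbers keep the order in
    which they appear in the directory.
    """
    ordered = sorted(entries, key=lambda entry: entry[0])
    return [(number, field_data[offset:offset + length])
            for number, length, offset in ordered]


# Field 245 is stored first in the data, but field 100 is emitted first.
sample_entries = [("100", 4, 3), ("245", 3, 0)]
sample_data = b"BB\x1eAAA\x1e"
print(fields_in_number_order(sample_entries, sample_data))
# [('100', b'AAA\x1e'), ('245', b'BB\x1e')]
```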
The list of variable field names may be extracted from
"Composite MARC Format", a tabular listing which has been
ordered. Individual organizations may utilize unnamed
fields; until they provide the names, a name such as
"Field 089" could be used.